Acoustic and Data-driven Features for Robust Speech Activity Detection
Abstract
In this paper we evaluate different features for speech activity detection (SAD). Several signal processing techniques are used to derive acoustic features that capture attributes of speech useful for distinguishing speech segments from noise. The acoustic features include short-term spectral features and long-term modulation features, both derived using Frequency Domain Linear Prediction (FDLP), and joint spectro-temporal features extracted using 2D filters on a cortical representation of speech. Posteriors of speech and non-speech from a trained multi-layer perceptron are also used as data-driven features for this task. These feature extraction techniques form part of an elaborate feature extraction front-end in which information spanning several hundred milliseconds of the signal is used along with heteroscedastic linear discriminant analysis for dimensionality reduction. Processed feature outputs from the proposed front-end are used to train SAD systems based on Gaussian mixture models for processing speech from multiple languages transmitted over noisy radio communication channels under the ongoing DARPA Robust Automatic Transcription of Speech (RATS) program. The proposed front-end performs significantly better than standard acoustic feature extraction techniques in these noisy conditions.
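The two generic steps named in the abstract, stacking context frames and deciding between speech and non-speech models, can be illustrated with a minimal sketch. It is not the authors' system. The features are synthetic two-dimensional values, each class is modeled by a single diagonal Gaussian as a one-component stand-in for a Gaussian mixture model, and the heteroscedastic linear discriminant analysis step is omitted. All function names, the context width, and the threshold are hypothetical choices made for illustration.

```python
import math
import random

def splice(frames, context):
    """Stack each frame with `context` neighbours on each side (edge frames are repeated)."""
    n = len(frames)
    out = []
    for t in range(n):
        stacked = []
        for k in range(-context, context + 1):
            idx = min(max(t + k, 0), n - 1)  # clamp indices at the signal edges
            stacked.extend(frames[idx])
        out.append(stacked)
    return out

def fit_diag_gaussian(vectors):
    """Estimate a per-dimension mean and variance (variance is floored for stability)."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [max(sum((v[d] - mean[d]) ** 2 for v in vectors) / n, 1e-3)
           for d in range(dim)]
    return mean, var

def log_likelihood(x, model):
    """Log-density of vector x under a diagonal Gaussian (mean, var)."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def detect(frames, speech_model, nonspeech_model, context, threshold=0.0):
    """Label a frame as speech when the log-likelihood ratio exceeds the threshold."""
    return [log_likelihood(x, speech_model) - log_likelihood(x, nonspeech_model) > threshold
            for x in splice(frames, context)]

def toy_frames(rng, n, level):
    """Hypothetical 2-dimensional 'features': noisy values centred on `level`."""
    return [[rng.gauss(level, 0.5), rng.gauss(level, 0.5)] for _ in range(n)]

rng = random.Random(0)
context = 2  # with an assumed 10 ms frame shift, 5 stacked frames span about 50 ms
speech_model = fit_diag_gaussian(splice(toy_frames(rng, 200, 3.0), context))
nonspeech_model = fit_diag_gaussian(splice(toy_frames(rng, 200, 0.0), context))

# 20 non-speech frames followed by 20 speech frames
test_frames = toy_frames(rng, 20, 0.0) + toy_frames(rng, 20, 3.0)
decisions = detect(test_frames, speech_model, nonspeech_model, context)
```

In a fuller system the spliced vectors would first be projected to a lower dimension, and each class would use a multi-component mixture instead of a single Gaussian.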
Similar resources
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have proven effective in speech recognition systems for feature extraction as well as acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
Speech activity detection fusing acoustic phonetic and energy features
With the wider deployment of automatic speech recognition (ASR) systems, the importance of robust speech activity detection has been elevated both as a means of reducing bandwidth in client/server ASR and for overall system stability from barge-in through the recognition process. In this paper we investigate a novel technique for speech activity detection that we have found to be effective in ...
Developing a Speech Activity Detection System for the DARPA RATS Program
This paper describes the speech activity detection (SAD) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state-of-the-art detection capabilities on audio from highly degraded communication channels. We present two approaches to SAD, one based on Gaussian mixture models, and one based on multi-la...
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
Convolutional Neural Network with Adaptive Windows for Speech Recognition
Although speech recognition systems are widely used and their accuracy is continuously increasing, there is a considerable performance gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...